{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3o8Qof7Cy165"
   },
   "source": [
    "# LAB 3c:  BigQuery ML Model Deep Neural Network.\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "1. Create and evaluate DNN model with BigQuery ML.\n",
    "1. Create and evaluate DNN model with feature engineering with ML.TRANSFORM.\n",
    "1. Calculate predictions with BigQuery's ML.PREDICT.\n",
    "\n",
    "\n",
    "## Introduction \n",
    "In this notebook, we will create multiple deep neural network models to predict the weight of a baby before it is born, using first no feature engineering and then the feature engineering from the previous lab using BigQuery ML.\n",
    "\n",
    "We will create and evaluate a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculate predictions with BigQuery's ML.PREDICT. If you need a refresher, you can go back and look how we made a baseline model in the notebook [BQML Baseline Model](../solutions/3a_bqml_baseline_babyweight.ipynb) or how we combined linear models with feature engineering in the notebook [BQML Linear Models with Feature Engineering](../solutions/3b_bqml_linear_transform_babyweight.ipynb).\n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/3c_bqml_dnn_babyweight.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hJ7ByvoXzpVI"
   },
   "source": [
    "## Load necessary libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mC9K9Dpx1ztf"
   },
   "source": [
    "Check that the Google BigQuery library is installed and if not, install it. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 609
    },
    "colab_type": "code",
    "id": "RZUQtASG10xO",
    "outputId": "5612d6b0-9730-476a-a28f-8fdc14f4ecde"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "pip freeze | grep google-cloud-bigquery==1.6.1 || \\\n",
    "pip install google-cloud-bigquery==1.6.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "clnaaqQsXkwC"
   },
   "source": [
    "## Verify tables exist\n",
    "\n",
    "Run the following cells to verify that we have previously created the dataset and data tables. If not, go back to lab [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) to create them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "-- LIMIT 0 is a free query; this allows us to check that the table exists.\n",
    "SELECT * FROM babyweight.babyweight_data_train\n",
    "LIMIT 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "-- LIMIT 0 is a free query; this allows us to check that the table exists.\n",
    "SELECT * FROM babyweight.babyweight_data_eval\n",
    "LIMIT 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #1: Model 4:  Increase complexity of model using DNN_REGRESSOR\n",
    "\n",
    "DNN_REGRESSOR is a new regression model_type vs. the LINEAR_REG that we have been using in previous labs.\n",
    "\n",
    "* MODEL_TYPE=\"DNN_REGRESSOR\"\n",
    "\n",
    "* hidden_units: List of hidden units per layer; all layers are fully connected. Number of elements in the array will be the number of hidden layers. The default value for hidden_units is [Min(128, N / (𝜶(Ni+No)))] (1 hidden layer), with N the training data size, Ni, No the input layer and output layer units, respectively, 𝜶 is constant with value 10. The upper bound of the rule will make sure the model won’t be over fitting. Note that, we currently have a model size limitation to 256MB.\n",
    "\n",
    "* dropout: Probability to drop a given coordinate during training; dropout is a very common technique to avoid overfitting in DNNs. The default value is zero, which means we will not drop out any coordinate during training.\n",
    "\n",
    "* batch_size: Number of samples that will be served to train the network for each sub iteration. The default value is Min(1024, num_examples) to balance the training speed and convergence. Serving all training data in each sub-iteration may lead to convergence issues, and is not advised."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create DNN_REGRESSOR model\n",
    "\n",
    "Change model type to use DNN_REGRESSOR, add a list of integer HIDDEN_UNITS, and add an integer BATCH_SIZE.\n",
    "* Hint:  Create a model_4.\n",
    "Note: Model creation takes around 40-50 minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "CREATE OR REPLACE MODEL\n",
    "    babyweight.model_4\n",
    "OPTIONS (\n",
    "    # TODO: Add DNN options\n",
    "    INPUT_LABEL_COLS=[\"weight_pounds\"],\n",
    "    DATA_SPLIT_METHOD=\"NO_SPLIT\") AS\n",
    "\n",
    "SELECT\n",
    "    # TODO: Add base features and label\n",
    "FROM\n",
    "    babyweight.babyweight_data_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "AVPXGKZ374v7"
   },
   "source": [
    "### Get training information and evaluate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first look at our training statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.model_4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's evaluate our trained model on our eval dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.model_4,\n",
    "    (\n",
    "    SELECT\n",
    "        weight_pounds,\n",
    "        is_male,\n",
    "        mother_age,\n",
    "        plurality,\n",
    "        gestation_weeks\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use our evaluation's `mean_squared_error` to calculate our model's RMSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    SQRT(mean_squared_error) AS rmse\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.model_4,\n",
    "    (\n",
    "    SELECT\n",
    "        weight_pounds,\n",
    "        is_male,\n",
    "        mother_age,\n",
    "        plurality,\n",
    "        gestation_weeks\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #2: Final Model:  Apply the TRANSFORM clause\n",
    "\n",
    "Before we perform our prediction, we should encapsulate the entire feature set in a TRANSFORM clause as we did in the last notebook. This way we can have the same transformations applied for training and prediction without modifying the queries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's apply the TRANSFORM clause to the final model and run the query.\n",
    "Note: Model creation takes around 40-50 minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "CREATE OR REPLACE MODEL\n",
    "    babyweight.final_model\n",
    "\n",
    "TRANSFORM(\n",
    "    weight_pounds,\n",
    "    is_male,\n",
    "    mother_age,\n",
    "    plurality,\n",
    "    gestation_weeks,\n",
    "    # TODO: Add FEATURE CROSS of:\n",
    "    # is_male, bucketed_mother_age, plurality, and bucketed_gestation_weeks\n",
    "\n",
    "OPTIONS (\n",
    "    # TODO: Add DNN options\n",
    "    INPUT_LABEL_COLS=[\"weight_pounds\"],\n",
    "    DATA_SPLIT_METHOD=\"NO_SPLIT\") AS\n",
    "\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    babyweight.babyweight_data_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first look at our training statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.final_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's evaluate our trained model on our eval dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        *\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use our evaluation's `mean_squared_error` to calculate our model's RMSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    SQRT(mean_squared_error) AS rmse\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        *\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DIuFS56o_F2a"
   },
   "source": [
    "## Lab Task #3: Predict with final model.\n",
    "\n",
    "Now that you have evaluated your model, the next step is to use it to predict the weight of a baby before it is born, using BigQuery `ML.PREDICT` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predict from final model using an example from original dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "29cuS1CbADE3"
   },
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.PREDICT(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        # TODO Add base features example from original dataset\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify above prediction query using example from simulated dataset\n",
    "\n",
    "Use the feature values you made up above, however set is_male to \"Unknown\" and plurality to \"Multiple(2+)\". This is simulating us not knowing the gender or the exact plurality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.PREDICT(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        # TODO Add base features example from simulated dataset\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Summary: \n",
    "In this lab, we created and evaluated a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculated predictions with BigQuery's ML.PREDICT."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2022 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "10.3.2019-DRAFT-_1 - FeatEnG - LABS.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
